This notebook shows how to create the conditional probability tables (CPTs) for the Student example from Koller & Friedman.
%run '_preamble.ipynb'
Python version: 3.8.10
The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
available imports:
  import os
  import logging
  import pandas as pd
  import numpy as np
connect to this kernel with:
  jupyter console --existing 9fa5c31c-e49b-417e-882b-5f4ace153127
Could not create logging directory "../logs"
Logging to: "../logs/notebook.log"
Current date/time: 11-06-2021, 21:27
Current working directory: "/Users/melle/software-development/thomas-master/notebooks"
from thomas.core.factors import CPT
from IPython.display import display, HTML
def subset(full_dict, keys):
    """Return a subset of a dict."""
    return {k: full_dict[k] for k in keys}
# We're defining CPTs for multiple random variables. The dictionary
# `states` keeps track of the states each variable can take on.
states = {
    'I': ['i0', 'i1'],
    'S': ['s0', 's1'],
    'D': ['d0', 'd1'],
    'G': ['g1', 'g2', 'g3'],
    'L': ['l0', 'l1'],
}
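The number of probabilities each CPT needs follows from `states`: it is the product of the state counts of all variables in the table. The sketch below uses only the standard library; `n_entries` is a hypothetical helper introduced for illustration and is not part of `thomas`.

```python
from math import prod

states = {
    'I': ['i0', 'i1'],
    'S': ['s0', 's1'],
    'D': ['d0', 'd1'],
    'G': ['g1', 'g2', 'g3'],
    'L': ['l0', 'l1'],
}

def n_entries(states, variables):
    """Number of probabilities in a table over `variables` (hypothetical helper)."""
    return prod(len(states[v]) for v in variables)

n_G = n_entries(states, ['I', 'D', 'G'])  # 2 * 2 * 3 entries for P(G|I,D)
n_L = n_entries(states, ['G', 'L'])       # 3 * 2 entries for P(L|G)
```

Counting entries this way is a quick sanity check before typing out a long list of probabilities by hand.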
# We'll store the CPTs in a dict, indexed by the name of the
# conditioned variable.
P = dict()
# Create the CPT for random variable I. Since I has no parents, this table
# holds prior probabilities rather than conditional ones.
P['I'] = CPT(
    [0.7, 0.3],
    states=subset(states, ['I']),
    description='Intelligence'
)
# Display the CPT for random variable 'I': intelligence. The variable's states
# are listed as columns.
P['I']
| I | i0 | i1 |
|---|---|---|
| | 0.7 | 0.3 |
# Create the CPT for random variable 'S'. The probabilities for S are conditional
# on I. In other words, the CPT defines S given I which can be written as
# P(S|I).
P['S'] = CPT(
    [0.95, 0.05,
     0.20, 0.80],
    states=subset(states, ['I', 'S']),
    description='SAT Score'
)
# Display the CPT for random variable 'S': SAT Score. Again, the variable's
# states are listed as columns. The conditioning variables' states are listed
# as rows.
P['S']
| S | s0 | s1 |
|---|---|---|
| I | | |
| i0 | 0.95 | 0.05 |
| i1 | 0.20 | 0.80 |
# Internally, P['S'] is essentially a multi-level factor
print(P['S'])
P(S|I)
I   S
i0  s0    0.95
    s1    0.05
i1  s0    0.20
    s1    0.80
dtype: float64
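The printed output suggests how the flat list of probabilities lines up with the states: the conditioning variable I varies slowest and the conditioned variable S varies fastest. The sketch below reproduces that layout with plain pandas. It is an assumption inferred from the output above, not a description of how `thomas` stores its data, and the name `p_s_given_i` is made up for illustration.

```python
import pandas as pd

states = {'I': ['i0', 'i1'], 'S': ['s0', 's1']}

# Same flat list as passed to CPT(...) for P(S|I).
values = [0.95, 0.05,
          0.20, 0.80]

# from_product enumerates combinations with the last level varying fastest,
# which matches the row-by-row layout of the list above.
index = pd.MultiIndex.from_product(
    [states['I'], states['S']], names=['I', 'S']
)
p_s_given_i = pd.Series(values, index=index, name='P(S|I)')

print(p_s_given_i)
```

With this layout, each row of the list corresponds to one state of I, so `p_s_given_i.loc[('i1', 's1')]` picks out the last entry.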
# Create the remainder of the CPTs
P['D'] = CPT(
    [0.6, 0.4],
    states=subset(states, ['D']),
    description='Difficulty'
)
P['G'] = CPT(
    [0.30, 0.40, 0.30,
     0.05, 0.25, 0.70,
     0.90, 0.08, 0.02,
     0.50, 0.30, 0.20],
    states=subset(states, ['I', 'D', 'G']),
    description='Grade'
)
P['L'] = CPT(
    [0.10, 0.90,
     0.40, 0.60,
     0.99, 0.01],
    states=subset(states, ['G', 'L']),
    description='Letter'
)
# There can, of course, be more than one conditioning variable
P['G']
| | G | g1 | g2 | g3 |
|---|---|---|---|---|
| I | D | | | |
| i0 | d0 | 0.30 | 0.40 | 0.30 |
| | d1 | 0.05 | 0.25 | 0.70 |
| i1 | d0 | 0.90 | 0.08 | 0.02 |
| | d1 | 0.50 | 0.30 | 0.20 |
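Each row of the table above is a distribution over G for one combination of I and D, so every row should sum to 1. A minimal check with NumPy, assuming the same row-by-row ordering as the list passed to `CPT` (this sketch does not use `thomas` itself):

```python
import numpy as np

# Probabilities for P(G|I,D), in the order they were passed to CPT(...).
values = np.array([
    0.30, 0.40, 0.30,   # i0, d0
    0.05, 0.25, 0.70,   # i0, d1
    0.90, 0.08, 0.02,   # i1, d0
    0.50, 0.30, 0.20,   # i1, d1
])

# Axes: I (2 states), D (2 states), G (3 states).
table = values.reshape(2, 2, 3)

# Sum over the conditioned variable G; one sum per (I, D) combination.
row_sums = table.sum(axis=-1)
all_normalized = np.allclose(row_sums, 1.0)
```

Using `np.allclose` rather than `==` avoids false alarms from floating-point rounding.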
# The CPT can be accessed through the __getitem__ accessor:
P['I']['i0']
0.7
# A multi-level CPT can also be converted to a factor over all of its variables
P['S'].as_factor()
factor(I,S)
I   S
i0  s0    0.95
    s1    0.05
i1  s0    0.20
    s1    0.80
dtype: float64